Preprocessing student grades data (pivoting)

Juan Andrés Cabral

We start with a CSV file containing student IDs, exercise names, and grades, with one row for each exercise a student completed. This long format makes analysis cumbersome when we want a comprehensive view of each student's performance across all exercises. The goal is to reshape the data into a more accessible format: one row per student, with one column per exercise holding that student's grade.
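As a rough illustration of the starting point, the sketch below loads a few rows of long-format data. The file contents are hypothetical (the original CSV is not shown); only the column names ID, Exercise and Grade come from the text. In practice you would pass the real file path to pd.read_csv instead of using io.StringIO.

```python
import io

import pandas as pd

# Hypothetical CSV contents for illustration; the real file is not shown in the text.
csv_text = """ID,Exercise,Grade
101,Ex1,8
101,Ex2,6
102,Ex1,9
103,Ex2,7
"""

# With a real file this would be pd.read_csv("<path to your CSV>").
grades = pd.read_csv(io.StringIO(csv_text))

# Long format: one row per (student, exercise) pair.
print(grades)

# How many exercise rows each student has; a student's grades are spread over several rows.
rows_per_student = grades.groupby("ID").size()
print(rows_per_student)
```

Seeing a single student's results here means filtering and collecting several rows, which is what the pivot below avoids.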

We use the pivot_table function in pandas to accomplish this. The parameters we pass to this function are:

- index=['ID']: the column to keep as the identifier for each row. In our case, this is the student's ID.
- columns='Exercise': the column to pivot. Each unique value in this column becomes a new column in the pivoted table.
- values='Grade': the column whose values fill the new columns. In our case, this is the grade the student received.
- fill_value=0: the value used for any cell with no data. In our case, if a student did not attempt an exercise, a 0 is placed instead. Pass the number 0 rather than the string '0', so the grade columns stay numeric. Note that this treats a missing exercise the same as a grade of zero.

After the pivot, each row represents a unique student, and each exercise has its own column holding the student's grade for that exercise. This format makes it much easier to analyze and compare student performance across exercises.
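A minimal sketch of the pivot, assuming the same hypothetical sample rows as above (the real data is not shown). One detail worth knowing: pivot_table aggregates whenever a student has more than one row for the same exercise, and its default aggfunc is "mean", so duplicate grades are averaged. It is spelled out here only to make that behavior visible.

```python
import io

import pandas as pd

# Hypothetical long-format data: one row per (student, exercise).
csv_text = """ID,Exercise,Grade
101,Ex1,8
101,Ex2,6
102,Ex1,9
103,Ex2,7
"""
grades = pd.read_csv(io.StringIO(csv_text))

wide = grades.pivot_table(
    index=["ID"],         # one row per student
    columns="Exercise",   # one column per unique exercise name
    values="Grade",       # cell contents
    fill_value=0,         # numeric 0 (not '0') for exercises a student has no grade for
    aggfunc="mean",       # pandas default; averages duplicate (ID, Exercise) rows
)

# Remove the "Exercise" label from the column axis for a cleaner printout.
wide.columns.name = None
print(wide)
```

Student 102 has no Ex2 row and student 103 has no Ex1 row, so those cells are filled with 0. If a zero grade and "not attempted" must stay distinguishable, leaving fill_value out keeps those cells as NaN instead.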